Abstract: Completing a PhD on time is a complex process, influenced by many 
interacting factors. In this paper we take a Bayesian Network approach to 
analyzing the factors perceived to be important in achieving this aim. Focusing 
on a single research group in Mathematical Sciences, we develop a conceptual 
model to describe the factors considered to be important to students and then 
quantify the network based on five individual perspectives: three students, a 
supervisor and a university research students centre manager. The resultant 
network comprised 37 factors and 40 connections, with an overall probability of 
timely completion of between 0.6 and 0.8. Across all participants, the four factors 
that were considered to most directly influence timely completion were personal 
aspects, the research environment, the research project, and incoming skills. 

Timely completion of a PhD is an important outcome for the student, the host university and the 
economy. However, completion of this programme in the required timeframe is dependent on 
many interacting factors. In this study, we develop a statistical complex systems approach to 
identify and quantify the important factors and their interactions that are perceived to impact on 
timely completion of a PhD in Statistics in an Australian university. We define timely 
completion to be within 3.5 years. We construct a Bayesian Network (BN) to describe these 
inter-relationships (Pearl, 1985); the construction and interpretation of a BN is described in more 
detail below. The conceptual model for the BN was developed collectively and then quantified 
by five candidates: three students, a supervisor and a university research students centre 
manager. 

Australian universities receive competitive funding for PhD enrolments and successful 
completions, yet completion rates are well below 100% (Jiranek, 2010). It is therefore of interest 
to institutions and government bodies if predictive or causal factors can be identified which may 
assist students to progress through their studies, or to better prepare for and support the 
postgraduate supervision of students. Gaining an understanding of factors affecting timely 
completion, and providing such information to prospective students to better equip them, could 
help to reduce attrition. 

We were interested in three main questions. First, what is the overall perceived 
probability of timely completion of a PhD in Statistics at QUT? Second, what factors were most 
influential in timely completion, and how do these differ between the five candidates? Third, 
what is the change in the probability of timely completion under specified scenarios? The 
scenarios chosen for evaluation are detailed below. 

I. Background. 

There is a substantial and growing literature that identifies important factors associated with 
completion of a PhD research project. In a meta-analysis based on over 160 references, Bair and 
Hanworth (2005) associate persistence rates with funding and socialization, and completion rates 
with positive and supportive mentor relationships. Maher, Ford and Thompson (2004) list a suite 
of factors frequently linked to completion time of doctoral degrees, including availability of 
funding resources, the nature of the advising relationship, the extent to which students receive 
research preparation and opportunities, and individual student concerns about marital, family or 
health problems. Seagram, Gould and Pyke (1998) also list several potential factors that impact 
on timely completion, including gender, discipline, supportive relationship, financial situation 
and enrolment status. A linear regression analysis of the results of a survey of 154 graduates of 
doctoral programs in three discipline areas at York University revealed that beginning a 
dissertation early in the program, remaining with the original topic and supervisor, meeting 
frequently with the supervisor and collaborating with supervisor on conference papers were 
important indicators, but only explained 30% of the total variance. 

The role of supervisors has also been examined by other authors; see, for example, Zhao 
(2007). A Procrastination Inventory proposed by Muszynski and Akamatsu (1991) revealed that 
demographic and situational variables, including a supportive advisor, finding a topic of interest, 
making the dissertation a top priority and living close to the university were predictive of 
success, but that specific research interests or measures of needs or values were not significant 
predictors. Psychological factors have also been investigated by other authors; see, for example, 
Kearnes, Gardiner and Marshall (2009) who focused on the important issue of self-sabotaging 
behaviour due to over-committing, procrastination and perfectionism. 

Another important domain that has been considered in the literature is the role of cohort 
partnerships and groups (Witte & James, 1998) and peer-to-peer support (Devenish, Dyer, & 
Jefferson, 2009). Race (Ellis, 2001), type of attendance (Rodwell, 2008) and gender (Maher, 
Ford, & Thompson, 2004) have also been discussed. 

A variety of perspectives about the issue of timely completion have also been considered. 
Barnes and Austin (2009) considered the role of doctoral advisors from the advisors’ perspective. 
Isaac, Quinlan and Walker (1992) have examined faculty perceptions of the doctoral dissertation, 
noting in particular field-related differences with respect to characteristics, content and purpose 
of the doctoral dissertation. The impact of departmental factors has also been identified by other 
authors; see, for example, de Valero (2001). 

The importance of this topic and the intense interest in it is underscored by the large, high 
profile Council of Graduate Schools Ph.D. Completion Program (2009), conducted in the USA 
and Canada, and the citations and references therein. The study profiles the following key factors 
influencing PhD completion: selection, mentoring, financial support, program environment, 
research mode of the field, and processes and procedures. 

There is now a large literature on the underpinning theory and methodology of BNs as 
well as their application to a wide range of problems. We have previously employed them to 
address environmental and health outcomes (e.g., Johnson et al., 2009, 2010; Waterhouse et al., 
2010), among other areas. They have also been used for over a decade in the education field; see 
for example the student models of Millan et al. (2010) and Carmona et al. (2008), models for 
assessing diagnostic performance considered by Almond et al. (2007), the general discussion of 
BNs in educational assessment by Mislevy et al. (2000), and the references therein. 

In this study, we focus on a single discipline area, Statistics, in the Mathematical Sciences 
Discipline at Queensland University of Technology (QUT), Australia. This focus is based on the 
findings of Seagram et al. (1998), Muszynski and Akamatsu (1991), Isaac et al. (1992) and de 
Valero (2001), among others, that there are discipline-related and institutional differences in PhD 
completion time itself, and the factors that potentially impact on it. 

II. Methods. 

A. Bayesian Networks. 

The first step in constructing a Bayesian Network is the development of a conceptual model of 
the factors and their interactions. This is depicted as a graphical model, or network, of nodes 
(representing the factors) and directed arrows (representing the interactions between the nodes). 
The final outcome (timely PhD completion) is called the terminal node. 

The second step of the BN typically involves categorizing each node into a (small) 
number of states, for example high/low, 0-10/10-20/20+, good/medium/bad. The thresholds for 
the states are chosen to be meaningful in the context of the problem. 

In the third and last step of the construction of the BN, each node is quantified by 
attaching probabilities to each state of the node. The probabilities are conditional on the states of 
the nodes feeding into it (as determined by the directed arrows in the network). 

A characteristic of the BN is that quantification of a node in the BN depends only on a 
subset of the network. Thus the whole problem is collapsed into a series of local analyses. 
Moreover, a variety of sources can be used for quantification of a node, including data, 
simulation models, statistical or mathematical models, results from literature or previous studies, 
expert knowledge, and so on. This ability to integrate diverse data is arguably one of the 
strengths of the BN approach. An iterative approach to designing a BN is described by Johnson 
et al. (2010). 

Once completed, the conditional probabilities ‘flow through’ the BN to provide an overall 
probability for each level of the terminal (outcome) node. The network can then be interrogated 
to identify the major factors influencing the outcome. Moreover, it can be employed to assess the 
impact of ‘evidence’ and evaluate scenarios, where these are represented by setting one or 
several of the nodes in the BN to specified levels. 
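
As a minimal sketch of how conditional probabilities ‘flow through’ a network, consider a toy BN with two root nodes feeding one terminal node. The node names, priors and conditional probabilities below are hypothetical values chosen for illustration only; they are not quantities elicited in this study, and the enumeration is written in plain Python rather than in the BN software used here. Evidence is restricted to root nodes, which keeps the calculation to a simple weighted sum.

```python
from itertools import product

# Hypothetical priors for two root nodes: P(node is in its positive state).
priors = {"Interest": 0.8, "Supervisor": 0.7}

# Hypothetical conditional probability table for the terminal node:
# P(Timely = yes | Interest, Supervisor), keyed by the parent states.
cpt_timely = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.5,
    (False, False): 0.2,
}


def prob_timely(evidence=None):
    """Return P(Timely = yes), optionally fixing root nodes via `evidence`."""
    evidence = evidence or {}
    names = list(priors)
    total = 0.0
    # Enumerate every joint state of the root nodes and weight the CPT entry.
    for states in product([True, False], repeat=len(names)):
        weight = 1.0
        for name, state in zip(names, states):
            if name in evidence:
                # A fixed node has probability 1 for its observed state.
                p = 1.0 if evidence[name] == state else 0.0
            else:
                p = priors[name] if state else 1.0 - priors[name]
            weight *= p
        total += weight * cpt_timely[states]
    return total
```

With these illustrative numbers, `prob_timely()` gives a baseline of about 0.73, and the scenario `prob_timely({"Supervisor": False})` lowers it to about 0.52. A full BN package handles evidence on any node, including internal ones, which this sketch does not attempt.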

B. Conceptual BN model. 

The structure of the Bayesian Network was developed during a series of meetings with a focus 
group comprising postgraduate students in Statistics at QUT from December 2010 to January 
2011. The focus group comprised 10 unincentivised volunteers, representing approximately 25% 
of all postgraduates enrolled in the Discipline at the time. While this sample was not 
probabilistically drawn, it was broadly representative with respect to personal demographics 
(age, gender, cultural background) and stage of completion. Based on the focus group meetings, 
a list of all possible factors was created, then those that were similar were merged and those that 
were deemed to be beyond the scope of the study were removed. The removed factors related to 
external political and financial environments, including the following: government attitudes to 
higher education, government funding for postgraduate students, global financial status and 
national financial status. Subjects were unable to quantify the impact of environments other than 
the current one, based on their own experience. The remaining factors were then classified into 
groups, which became the nodes of the network. Each of these nodes was then assigned binary 
states and operational definitions (Table 1). 

C. Quantification. 

The conceptual BN was translated to the software package GeNIe for probabilistic quantification. 
Five participants quantified the BN model. The first (A1) was a former domestic doctoral 
student, the second (A2) a current domestic PhD student, and the third (A3) a current 
international student. The fourth participant (B1) was a supervisor of these doctoral students, and 
the fifth (C1) was the manager of the university research students’ centre. 

The network was quantified by each participant independently, with guidance from two 
of the coauthors. The guidance provided was in the form of a structured statement describing a 
Bayesian Network, giving definitions of the nodes, and providing an example of how to 
complete the required conditional probability tables. The statement was provided to all 
participants. 

Participants were taken through each external node and asked to quantify the probability 
of each node being in the positive state. For internal nodes, participants were asked to complete 
the underlying conditional probability tables. For illustration, an example question that was 
asked of a subject in order to quantify the network is as follows: ‘if the factors that directly 
influence this node are all conducive, what is the probability of this factor being conducive?’ As 
previously stated, the relevant nodes and states (conducive, not conducive) had 
been defined for the subject. The probabilities provided by the subject were then confirmed 
through statements such as, ‘this value would indicate that in x out of 10 times, or for x out of 10 
students, this factor would be conducive, given that all of the input nodes are conducive’. Similar 
questions were asked for the other combinations of states making up the conditional probability 
tables. The subject was then invited to evaluate the full set of probabilities for consistency and 
relative magnitude. This process was repeated for all nodes in the network. Where subjects found 
this process difficult, they were instead asked to weight the importance of each of the input 
nodes. The weights were then standardized to sum to 1, and used as coefficients in a linear 
regression with indicator variables representing the input nodes. The outputs of the regression 
model were used as inputs into the conditional probability tables for the node under 
consideration. 
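
The weighting alternative can be sketched as follows. This is one possible reading of the procedure, not a record of the calculation actually performed: it assumes a regression with no intercept, so that the probability of the child node being in its positive state is the sum of the standardized weights of the parents that are in their positive state. The function name and the raw weights are hypothetical; the parent names are those listed for the Research Project node in Table 1.

```python
from itertools import product


def cpt_from_weights(raw_weights):
    """Build a CPT for a binary child node from importance weights.

    Assumption: the 'regression' is a no-intercept weighted sum of 0/1
    indicators, one per parent, with weights standardized to sum to 1.
    """
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weights = {name: w / total for name, w in raw_weights.items()}
    names = list(weights)
    cpt = {}
    # One CPT row per combination of parent states (1 = positive, 0 = negative).
    for states in product([1, 0], repeat=len(names)):
        cpt[states] = sum(weights[n] * s for n, s in zip(names, states))
    return names, cpt


# Hypothetical raw importance weights for the parents of one node.
example_weights = {
    "Interest": 4,
    "Supervisor": 3,
    "Topic": 2,
    "Written Research Type": 1,
}
names, cpt = cpt_from_weights(example_weights)
```

Under this assumption the table is anchored at 1 when all parents are positive and at 0 when none are, and each intermediate row follows additively from the weights, so a subject only has to supply one number per parent rather than one per row.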

D. Analysis and Interrogation. 

Final probabilities depicted in the output node were recorded as representations of each 
participant’s perceived probability of timely completion. Internal nodes feeding directly into the 
output node were independently interrogated to determine their effect on the stated probability in 
the output node. Finally, each node was interrogated independently to determine its final effect on 
the output node in order to determine any unexpected effects. 

III. Results. 

A. Overall network structure. 

Figure 1 depicts the conceptual BN model developed in this study. The network includes 37 
nodes and 40 connections, indicating that three nodes in the network connect to two other nodes 
each. Four internal nodes feed directly into the outcome node, each with their own network of 
factors influencing their state. Table 1 provides a full list of all included nodes with their possible 
states and operational definitions. 

Table 1. Structure nodes, states and operational definitions. 

| Node | Levels | Definition |
|---|---|---|
| Time Management Skills | Adequate/Inadequate | The ability of the student to plan and prioritise tasks to meet deadlines set by the university or supervisors. |
| Discipline Expertise | Adequate/Inadequate | The knowledge of the student regarding their discipline at the time of enrolment. |
| Math | Adequate/Inadequate | The student's general ability to understand and use mathematical logic. |
| Writing | Adequate/Inadequate | The student's general ability to clearly communicate their thoughts in writing. |
| English | First Language/Not First Language | The student's general level of skill with the English language. |
| Incoming Skills | Adequate/Inadequate | The research and management skills of the student at the time of enrolment. This is broadly defined as English, Writing, Math, Discipline Expertise and Time Management Skills. |
| Domestic Circumstance | Conducive/Non-conducive | The living arrangement of the student. This may vary, but whether it is conducive depends on the student. |
| Emotional State | Positive/Negative | How the student feels about life in general at any period during their degree. |
| Continuity of Study | Conducive/Non-conducive | Whether the student is returning to study after a period of time, or is continuing on directly after a different degree. |
| Personal Circumstance | Adequate/Inadequate | The family and social circumstances of the student. Broadly defined as the continuity of study, emotional state and domestic circumstances. |
| Financial Circumstance | Adequate/Inadequate | The financial position of the student. This is defined as their ability to meet their financial obligations. |
| Attitude | Conducive/Non-conducive | The student's perspective of how to approach challenges relating to their degree. |
| Personal Aspects | Conducive/Non-conducive | The collection of all factors related to a student's non-academic life. These are broadly defined as Attitude, Financial Circumstance and Personal Circumstance. |
| PhD Students | Useful/Not Useful | The presence and helpfulness of other PhD students. This might include their ability to resolve academic, administration or personal issues. |
| Researchers | Useful/Not Useful | The presence and helpfulness of relevant researchers. This might include their ability to resolve academic, administration or personal issues. |
| Peers | Useful/Not Useful | The presence and helpfulness of other PhD students and Researchers collectively. This might include their ability to resolve academic, administration or personal issues. |
| Enrolment | Full Time/Part Time | Whether the student is enrolled full time or part time. A full time load is 20 hours per week, whereas a part time load is 10 hours per week. |
| Study Location | Internal/External | Whether the student is based on or off campus. This is defined by whether they have a designated workspace on the University campus. |
| Research Environment | Conducive/Non-conducive | The general culture of research and physical environment in which the student exists. This might include whether the student is encouraged to attend conferences, or whether the campus (or home if the student studies externally) is safe and comfortable to work in. |
| Library Access | Adequate/Inadequate | The resources and access provided by the University Library. This would include books and journal subscriptions, and access to outside libraries. |
| Physical | Adequate/Inadequate | The physical resources of the University, such as car parks, lecture halls and study space. |
| Computer Access | Adequate/Inadequate | The availability and appropriateness of computer-based resources and assistance. This includes physical hardware such as desktop and laptop computers as well as software licences. |
| General Research Experience | Adequate/Inadequate | The supervisor's previous experience in academic research at the time of enrolment. This could be defined by the number of publications produced or the length of time actively involved in research. |
| Resources Available | Adequate/Inadequate | The general availability of resources related to the completion of a Research Higher Degree. This is broadly defined as Library Access, Physical resources, Computer access and General Research Experience. |
| Interest | High/Low | The student's interest in their thesis topic. |
| Written Research Type | Publication/Standard Report | The type of thesis submission the student nominates. Publication requires that all sections of the thesis consist of published papers, while a standard report is approved by a panel. |
| Expertise in Topic | Adequate/Inadequate | The knowledge of the supervisor regarding their expertise in the specific thesis subject at the time of enrolment. This might be determined by number of papers published on the subject, or length of time spent researching the substantive area. |
| Student History | Mostly Successful/Mostly Unsuccessful | The success record of the supervisor regarding previous postgraduate students. This is determined by the number of students completing on time divided by the number of students supervised. |
| Access | Adequate/Inadequate | The availability of the supervisor for meetings, comments and feedback. This is determined largely by the student's need to access the supervisor. |
| Supervision Experience | Adequate/Inadequate | The experience of the supervisor with supervising postgraduate students. This may be judged by the number of students previously supervised or the length of time spent actively supervising students. |
| Student-supervisor History | Positive/Negative | The relationship and history between the student and supervisor prior to enrolment. This may include any personal or academic relationships within or without the context of the research higher degree. |
| Supervisor | Helpful/Not Helpful | The helpfulness and timeliness of the supervisor's comments and feedback. This may be judged by the comprehensiveness, relevance and correctness of comments. |
| Research Niche | Specific/General | The specificity of the student's chosen thesis topic. This may be determined by the number of substantive areas in which the student considers their work relevant or the breadth of literature review (as judged by the number of publications and journals included) required to establish a theoretical base. |
| Previous Experience | Adequate/Inadequate | The student's previous experience with their research topic. This may include study in the area, but may also include relevant research or industry roles previously held by the student. |
| Topic | Conducive/Non-conducive | The topic of the student's thesis in relation to their experience. This is broadly defined as the Research Niche of the thesis and Previous Experience of the student. |
| Research Project | Conducive/Non-conducive | All aspects of the student's degree related to the specifics of their research project. This is broadly defined as their Interest, Written Research Type, Supervisor and Topic. |

B. Overall probability of timely completion. 


The output showed a perceived probability of timely PhD completion in Statistics at QUT 
ranging from 68% to 80% (Table 2). Amongst students, the current domestic student (A2) perceived 
the highest probability of timely completion (79%), followed by the current international student 
(A3; 72%). The supervisor (B1) perceived the second lowest probability of timely completion (70%), 
and the former domestic student (A1) perceived the lowest probability (68%). The research manager 
(C1) held the most optimistic overall view of the probability of timely completion (80%); although 
this was still in keeping with the other estimates, it was 10 percentage points higher than that of 
the supervisor (B1). 


Table 2. Final outcome probability of timely completion of a PhD based upon user beliefs. 

| Network | A1 | A2 | A3 | B1 | C1 |
|---|---|---|---|---|---|
| Probability | 0.68 | 0.79 | 0.72 | 0.70 | 0.80 |


C. Most influential factors. 

The most influential factors were found to be those feeding directly into the terminal node 
(timely completion). Results of the interrogation of these nodes are presented in Table 3 and 
depicted as radar plots in Figure 2. 

These analyses revealed that all four factors contribute substantially to the probability of 
timely completion. Moreover, while low levels of one or two of the identified factors can lower 
the probability to around 0.5 (a 50/50 chance of timely completion), there is almost unanimous 
agreement that low levels of more than two factors reduce this probability to less than 0.5. 

The largest differences in the probabilities awarded to the different combinations of 
factors were observed between the supervisor (B1) and research manager (C1). Compared with 
the research manager, the supervisor showed much greater concern about timely completion for 
low levels of the research project, either alone when all other factors were at high levels, or with 
low levels of personal aspects when the other two factors were at high levels. In contrast, the 
research manager showed greater concern than the supervisor when the research environment 
was at a low level, either alone when all other factors were at high levels, or paired with low 
levels of incoming skills and/or personal aspects. 

The strength of influence of the different factors was also evaluated for each respondent. 
All three students and the supervisor identified availability of resources and presence of other 
researchers or PhD students as influential. In addition, the former domestic student identified the 
research topic, and the current domestic student identified attitude, financial and personal 
circumstances. The PhD supervisor also identified the importance of attitude, emotional state, 
maths background, previous experience, the research topic and the student-supervisor relationship. 
While the BN constructed by the research manager similarly revealed the importance of other 
researchers, other PhD students and the candidate’s attitude, it also highlighted continuity of 
study, previous experience and the research niche. 
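
Table 3. Relative influence of direct internal nodes on outcome of interest (timely completion), 
scaled to range between 0 and 1. 

| Personal Aspects | Research Environment | Research Project | Incoming Skills | A1 | A2 | A3 | B1 | C1 |
|---|---|---|---|---|---|---|---|---|
| High | High | High | High | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| High | High | High | Low | 0.8 | 0.8 | 0.8 | 0.8 | 0.7 |
| High | High | Low | High | 0.7 | 0.7 | 0.7 | 0.6 | 0.9 |
| High | High | Low | Low | 0.6 | 0.5 | 0.5 | 0.4 | 0.6 |
| High | Low | High | High | 0.7 | 0.7 | 0.8 | 0.8 | 0.5 |
| High | Low | High | Low | 0.6 | 0.5 | 0.5 | 0.6 | 0.3 |
| High | Low | Low | High | 0.6 | 0.5 | 0.5 | 0.3 | 0.5 |
| High | Low | Low | Low | 0.3 | 0.3 | 0.3 | 0.2 | 0.2 |
| Low | High | High | High | 0.7 | 0.7 | 0.7 | 0.8 | 0.8 |
| Low | High | High | Low | 0.6 | 0.5 | 0.5 | 0.5 | 0.5 |
| Low | High | Low | High | 0.5 | 0.5 | 0.5 | 0.4 | 0.7 |
| Low | High | Low | Low | 0.3 | 0.3 | 0.2 | 0.2 | 0.4 |
| Low | Low | High | High | 0.5 | 0.5 | 0.5 | 0.6 | 0.3 |
| Low | Low | High | Low | 0.7 | 0.2 | 0.3 | 0.4 | 0.1 |
| Low | Low | Low | High | 0.2 | 0.3 | 0.2 | 0.1 | 0.2 |
| Low | Low | Low | Low | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |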
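
One simple way to read Table 3 is to compare, for each participant, the drop from the all-High row when a single factor is set to Low. The sketch below does this using only the four single-factor rows of Table 3; the function names are hypothetical and the calculation is offered as an illustration, not as the interrogation procedure used in the study.

```python
# Participant labels, in the column order of Table 3.
PARTICIPANTS = ["A1", "A2", "A3", "B1", "C1"]

# Table 3 rows in which exactly one direct parent is Low and the rest are High.
ONE_FACTOR_LOW = {
    "Personal Aspects": [0.7, 0.7, 0.7, 0.8, 0.8],
    "Research Environment": [0.7, 0.7, 0.8, 0.8, 0.5],
    "Research Project": [0.7, 0.7, 0.7, 0.6, 0.9],
    "Incoming Skills": [0.8, 0.8, 0.8, 0.8, 0.7],
}

# Scaled value in the all-High row of Table 3 (the same for every participant).
ALL_HIGH = 1.0


def single_factor_drops(participant):
    """Drop from the all-High row when each factor alone is set to Low."""
    idx = PARTICIPANTS.index(participant)
    return {
        factor: round(ALL_HIGH - values[idx], 2)
        for factor, values in ONE_FACTOR_LOW.items()
    }


def most_sensitive_factor(participant):
    """Factor with the largest single-factor drop; ties go to the first listed."""
    drops = single_factor_drops(participant)
    return max(drops, key=drops.get)
```

For the supervisor (B1) the largest single-factor drop is for the Research Project, and for the research manager (C1) it is for the Research Environment, in line with the contrast described in the text. Several students have tied drops, so the tie-breaking rule matters for them and the result should not be over-interpreted.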


IV. Discussion. 

This BN analysis revealed the following answers to the three main questions posed in this study. 
First, despite their different perspectives, there was general agreement among the participants in 
our study that the overall likelihood of timely PhD completion was around 0.7 to 0.8; that is, that 
on average just under one student in four will not graduate within the given period. After the 
research manager (C1), the current domestic student (A2) rated the probability of timely completion 
the highest, followed by the current international student (A3) and the supervisor (B1). The former 
domestic PhD student (A1) was the most pessimistic about timely completion. 

Across all participants, the Research Project was the most important factor impacting on 
timely completion, followed by the Research Environment. Interestingly, Incoming Skills and 
Personal Aspects were judged to be equally the least important. 

Of course, it is not possible to make any general statements or inferences based on this 
small sample. However, the study does highlight that students and engaged staff can indeed 
develop a complex systems model for timely PhD completion, and then quantify it based on their 
expert judgment. The study also demonstrates that the quantitative outputs are useful for 
answering questions about PhD completion, including the likelihood of timely completion and 
the impact of factors contributing to this outcome. Finally, the outputs of a BN can facilitate 
understanding and decision-making about PhD matters by students, supervisors and university 
management. 

Bayesian Networks based on a person’s opinion are difficult to validate externally. 
Internal validation can proceed via cross-referencing of probabilities, inspection of consistency 
of probability statements in sub-nodes, and so on. However, nodes like a person’s emotional 
state cannot be objectively measured. Notwithstanding this, these are important factors, and a BN 
approach allows these to be included and quantified at a high level. 




Figure 2. Radar plots of relative influence of factors directly influencing the target outcome 
(timely completion); probabilities are as displayed in Table 3. Factors are Personal Aspects, 
Research Environment, Research Project and Incoming Skills. Hence ‘HHHH’ refers to high 
levels of all factors, ‘HLLL’ refers to a high level of Personal Aspects and low levels of the other 
factors, and so on. 

It is noted that the study reported in this paper has focused on factors perceived to be 
important contributors to timely PhD completion and consequently has provided perceived 
probabilities of completion through the Bayesian Network analysis. These perceptions could lay 
the groundwork for further modeling of actual factors and completion rates, and the parallels 
between the perceived and actual Networks. There were three reasons why this was not pursued 
as part of the present study. First, perceptions are important in their own right, since they lead to 
a deeper understanding of the human aspects of the problem and can thus contribute strongly to 
behavioural and management change frameworks. Second, not all of the factors identified in the 
study have unequivocal objective metrics that are routinely collected by Universities. Third, 
confidentiality concerns constrained a more objective analysis, particularly for the defined group 
of interest in this study. This motivates a larger future study that would address all three of these 
issues. Such a study could comprise students and supervisors from a wider range of disciplines, 
to both generate the network structure and quantify it. 

Almost 30 years ago, Abedi and Benkin (1984) described research into reasons 
contributing to timely completion of degrees as “charitably sparse” (p.4). Twenty years later, 
Maher, Ford and Thompson (2004) argued that empirical research in this field could still be 
described as such. There has been considerable literature on the topic in the intervening years, 
and it is hoped that the present study contributes to our growing understanding of timely 
completion as a complex system. 